Exploring Potential Influencers of Math Student Performance

Introduction

This project dives deep into a dataset from the UCI Machine Learning Repository that contains data on student performance in math programs at two Portuguese secondary schools. The data was collected using school reports and student questionnaires and covers academic, personal, and social aspects of a student’s life.

Why Is This Interesting?

This dataset is interesting because it shows how multiple aspects of a student’s life correlate with their scores in school. In addition, the features G1, G2, and G3 represent grades for periods 1 through 3 of the school year (in United States terms, a midterm would be period 1 of 2 and finals period 2 of 2). This lets us see whether a student’s score decreases over time (falling behind gradually due to external factors) or stays consistent throughout the year. These features also allow an ML model to predict G3 from G1, G2, and the other external factors.

Objective

Our goal is to determine factors that have the most significant impact on student performance in the math program. In addition, we hope to discover factors which may seem influential based on intuition but prove otherwise after analyzing the data.

Categories Analyzed

Logistics & School Choice

  • traveltime — Home to school travel time (1 to 4 scale)
  • reason — Reason for choosing the school (home, reputation, course, other)

School & Demographics

  • Do students from different schools (GP vs MS) show significant differences in final performance (G3)?
  • Is there an interaction between romantic relationships and gender on academic performance?

Family Background

  • Do students with higher parental education (Medu/Fedu) perform better in math?
  • Does having family support at home (famsup) help students perform better in school?

Academic Support & Study Habits

  • Which factors (e.g., studytime, failures, goout, health) are the most predictive of final grades (G3)?
  • Does weekly study time affect performance consistently across all three grading periods (G1, G2, G3)?

Activities & Social Life

  • Do students who have a romantic relationship perform differently from those who don’t?
  • Is there a relationship between students’ participation in extracurricular activities and their number of absences?

Health & Lifestyle

  • What is the relationship between alcohol consumption (weekdays Dalc vs weekends Walc) and final grades?
  • Do students with internet access and more activities have more school absences than others?

Academic Performance

  • Do students who failed more subjects in the past continue to perform poorly in current grading periods?
  • How consistent are students’ performances across G1, G2, and G3? Is there improvement or decline over time?

Load Packages

We utilized the following libraries:

Data Cleaning

Using transmute(), we converted character columns to factors and ordered those whose levels have a natural order. We used transmute() instead of mutate() so that the original columns would be dropped, leaving no redundant data. The dataset has no missing values, which we confirmed with sum(is.na()).
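The cleaning step described above can be sketched as follows. This is a minimal illustration rather than the project's actual code: the three-row data frame is a hypothetical stand-in for the raw file, and the renamed columns (study_time, final_grade) are assumptions chosen for the example.

```r
library(dplyr)

# Hypothetical stand-in for the raw data (the real file has many more columns).
raw <- data.frame(
  school    = c("GP", "MS", "GP"),
  sex       = c("F", "M", "F"),
  studytime = c(2L, 1L, 4L),
  G3        = c(11L, 8L, 15L),
  stringsAsFactors = FALSE
)

# Count missing values across the whole data frame.
n_missing <- sum(is.na(raw))

# transmute() keeps only the columns created here, so the original
# character columns are dropped instead of sitting next to the new factors.
clean <- raw %>%
  transmute(
    school      = factor(school),
    sex         = factor(sex),
    # studytime is a 1 to 4 scale, so it becomes an ordered factor.
    study_time  = factor(studytime, levels = 1:4, ordered = TRUE),
    final_grade = G3
  )

str(clean)
```

With mutate(), the original studytime and G3 columns would remain alongside the new ones; transmute() avoids that duplication at the cost of having to list every column to keep.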

The “DT” package is an R interface to the JavaScript library “DataTables”. Following the documentation, we enabled horizontal scrolling, automatic column widths, and centered values within each column via a list of lists.
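A table with those three settings might be configured as shown here. The exact options used in the project are not reproduced in this report, so treat this as a sketch: the demo data frame is hypothetical, while scrollX, autoWidth, and columnDefs are standard DataTables options passed through DT's options argument.

```r
library(DT)

# Hypothetical two-row data frame standing in for the cleaned dataset.
demo <- data.frame(
  school = c("GP", "MS"),
  sex    = c("F", "M"),
  G3     = c(11, 8)
)

table_widget <- datatable(
  demo,
  options = list(
    scrollX   = TRUE,  # horizontal scrolling for wide tables
    autoWidth = TRUE,  # let DataTables size the columns
    # columnDefs is a list of lists: each inner list is one rule.
    columnDefs = list(
      list(className = "dt-center", targets = "_all")  # center every column
    )
  )
)

table_widget
```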

Exploratory Data Analysis

A quick way for us to explore the data was a correlation heatmap. Strong positive correlations are shown in red and strong negative correlations in blue. This plot was used as a starting point for the questions we wanted answered, as well as for basic plots that could then benefit from a third degree of comparison (factors).
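One way to build such a heatmap is sketched below, assuming ggplot2 is available. The six-row data frame is made up for illustration, and the plotting choices (geom_tile with a diverging blue-to-red scale) are an assumption about how the figure could be drawn, not a record of the original code.

```r
library(ggplot2)

# Hypothetical numeric columns standing in for the real dataset.
num <- data.frame(
  G1       = c(5, 10, 12, 15, 8, 14),
  G2       = c(6, 9, 13, 15, 7, 14),
  G3       = c(6, 10, 13, 16, 0, 15),
  failures = c(2, 0, 0, 0, 3, 0),
  absences = c(6, 4, 10, 2, 0, 4)
)

# Pairwise Pearson correlations, then reshape the matrix to long form
# (one row per variable pair) so ggplot2 can draw one tile per pair.
corr <- cor(num)
corr_long <- as.data.frame(as.table(corr))
names(corr_long) <- c("var1", "var2", "r")

heatmap_plot <- ggplot(corr_long, aes(x = var1, y = var2, fill = r)) +
  geom_tile() +
  # Diverging scale: blue for negative, white near zero, red for positive.
  scale_fill_gradient2(low = "blue", mid = "white", high = "red",
                       midpoint = 0, limits = c(-1, 1)) +
  labs(x = NULL, y = NULL, fill = "Correlation")

heatmap_plot
```

Only numeric columns can go into cor(), which is why factor features are absent from the heatmap and are instead candidates for a third degree of comparison.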

Intuitive relationships:

Some relationships to look into more:

Features not in the correlation heatmap: (Potential 3rd Degrees of Comparison)

Shikyna - School & Demographics

Shikyna - Family Background

Academic Support & Study Habits

Which factors are most predictive of final grades?

Minimal Impact: |∆| < 1 (∆ = level average final grade minus overall average)

Factor               Level       Avg G3       ∆
school               GP           10.49    0.11
school               MS            9.62   -0.76
sex                  F             9.97   -0.41
sex                  M            10.83    0.45
address              R             9.51   -0.87
address              U            10.63    0.25
fam_size             GT3          10.13   -0.25
fam_size             LE3          11.00    0.62
parental_stat        A            11.20    0.81
parental_stat        T            10.29   -0.09
mom_edu              2             9.73   -0.65
mom_edu              3            10.30   -0.08
dad_edu              2            10.26   -0.12
dad_edu              3            10.66    0.28
dad_edu              4            11.19    0.81
mom_job              other         9.82   -0.56
mom_job              services     11.02    0.64
mom_job              teacher      10.79    0.41
dad_job              at_home      10.00   -0.38
dad_job              other        10.18   -0.20
dad_job              services     10.24   -0.14
attend_reason        course        9.78   -0.60
attend_reason        home         10.26   -0.12
attend_reason        other        11.17    0.79
attend_reason        reputation   11.07    0.68
guardian             father       10.69    0.31
guardian             mother       10.43    0.05
school_support       no           10.52    0.14
school_support       yes           9.43   -0.95
family_support       no           10.55    0.17
family_support       yes          10.27   -0.11
extra_paid_classes   no            9.93   -0.45
extra_paid_classes   yes          10.92    0.54
activities           no           10.27   -0.11
activities           yes          10.49    0.11
nursery_school       no            9.81   -0.57
nursery_school       yes          10.54    0.15
pursue_higher_edu    yes          10.57    0.19
internet_use         yes          10.62    0.24
romantic             no           10.78    0.40
romantic             yes           9.58   -0.81
family_relationship  1            10.62    0.24
family_relationship  2             9.89   -0.49
family_relationship  3            10.04   -0.34
family_relationship  4            10.36   -0.02
family_relationship  5            10.69    0.31
free_time            1             9.84   -0.54
free_time            3             9.71   -0.67
free_time            4            10.43    0.05
free_time            5            11.30    0.92
go_out_w_friends     1             9.87   -0.51
go_out_w_friends     2            11.04    0.66
go_out_w_friends     3            10.96    0.58
go_out_w_friends     4             9.65   -0.73
workday_alcohol      1            10.68    0.30
workday_alcohol      3            10.50    0.12
workday_alcohol      4             9.89   -0.49
workday_alcohol      5            10.67    0.29
weekend_alcohol      1            10.74    0.35
weekend_alcohol      2             9.94   -0.44
weekend_alcohol      3            10.72    0.34
weekend_alcohol      4             9.69   -0.69
weekend_alcohol      5            10.14   -0.24
health               2            10.22   -0.16
health               3            10.01   -0.37
health               4             9.93   -0.45
health               5            10.40    0.02

Significant Impact: |∆| ≥ 1.5 (∆ = level average final grade minus overall average)

Factor               Level       Avg G3       ∆
mom_edu              0            13.00    2.62
mom_edu              1             8.68   -1.70
dad_edu              0            13.00    2.62
mom_job              health       12.15    1.77
dad_job              teacher      11.97    1.58
pursue_higher_edu    no            6.80   -3.58


After calculating the overall average final grade, we ran every factor column through a custom function that compares the average final grade at each factor level with that overall average.

We then combined the results with bind_rows(). Levels whose average fell within 1 point of the overall mean were considered to have minimal individual impact, while those that differed by ≥ 1.5 points were classified as having significant individual impact.
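The comparison can be sketched roughly as follows. The custom function itself is not shown in this report, so level_diffs() is a hypothetical reconstruction, the demo data is invented, and the "in between" label for differences from 1 up to 1.5 points is an assumption (the report only names the two outer classes).

```r
library(dplyr)

# Hypothetical data: two factor columns and a final grade.
demo <- data.frame(
  higher   = factor(c("yes", "yes", "no", "yes", "no", "yes")),
  romantic = factor(c("no", "yes", "no", "no", "yes", "no")),
  G3       = c(12, 10, 6, 14, 5, 13)
)

overall_avg <- mean(demo$G3)

# Hypothetical helper: average G3 per level of one factor column,
# expressed as a difference from the overall average.
level_diffs <- function(data, factor_col, overall) {
  data %>%
    group_by(level = as.character(.data[[factor_col]])) %>%
    summarise(factor_avg = mean(G3), .groups = "drop") %>%
    mutate(factor_name = factor_col,
           difference  = factor_avg - overall) %>%
    select(factor_name, level, factor_avg, difference)
}

# Run the helper over every factor column and stack the results.
factor_cols <- names(demo)[sapply(demo, is.factor)]
results <- bind_rows(
  lapply(factor_cols, function(col) level_diffs(demo, col, overall_avg))
)

# Classify by absolute difference using the report's two thresholds.
results <- results %>%
  mutate(impact = case_when(
    abs(difference) < 1    ~ "minimal",
    abs(difference) >= 1.5 ~ "significant",
    TRUE                   ~ "in between"  # assumed label for the gap
  ))

print(results)
```

Using the absolute difference matters here, since a level that sits far below the overall mean is as noteworthy as one that sits far above it.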



Left: Students’ average final grades increase with a higher combined parental education level.

Right: There is no clear relationship between final grades and either family relationship quality or family educational support.



Across all buckets of student study time, average final grades are marginally better for students with internet access at home.


Does weekly study time affect performance consistently across all terms?



Top: Shows the distribution of student grades across different weekly study time categories for each academic period. The red text denotes the number of failures in each period. Students in groups 3 and 4 have moderately higher mean grades across all periods but still have failures in the final period, illustrating that studying alone does not prevent poor grades.

Bottom Left: Shows the grade trend for students in each of the 4 study groups throughout the school year. The sharpest decline is in group 4 from term 2 to the final, which may reflect either burnout or poor information retention among students studying > 10 hours a week. The consistent decline across all 4 groups from term 1 to the final suggests that the averages are being dragged down by failing students.

Bottom Right: Excludes failed students from average grade calculations. Now, all four groups show a consistent positive trend in averages with higher study times being associated with higher averages.
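The effect of excluding failed students can be illustrated with a small sketch. The report does not state how a failed student is defined, so the filter used here (a final grade of 0) is an assumption, and the data frame and period_means() helper are hypothetical.

```r
library(dplyr)

# Hypothetical grades for two study-time groups across the three periods.
demo <- data.frame(
  study_time = c(1, 1, 1, 2, 2, 2),
  G1 = c(8, 10, 9, 12, 11, 10),
  G2 = c(8, 11, 7, 13, 11, 8),
  G3 = c(9, 11, 0, 14, 12, 0)
)

# Hypothetical helper: mean grade per study-time group for each period.
period_means <- function(data) {
  data %>%
    group_by(study_time) %>%
    summarise(across(c(G1, G2, G3), mean), .groups = "drop")
}

# All students: zeros in G3 pull the final-period averages down.
all_students <- period_means(demo)

# Assumption: a "failed" student is one with a final grade of 0.
passing_only <- period_means(filter(demo, G3 > 0))

print(all_students)
print(passing_only)
```

In this toy data the group averages fall from G1 to G3 when everyone is included and rise once the zero-grade rows are removed, which is the same reversal the two bottom panels describe.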

Activities and Social Life

Do romantic relationships affect student performance?



Left: Single students consistently achieve higher scores across all levels of time spent going out with friends. Among those who rarely go out with friends, those in a romantic relationship show a notable drop in grades, which suggests that time spent dating detracts not only from time spent with friends but also from studying.

Right: Among single students, those who often go out with friends have the lowest scores. In contrast, students in a romantic relationship have the lowest scores when they rarely go out with friends, echoing the conclusion drawn from the previous plot.


Lauren - Health & Lifestyle

Lauren - Academic Performance

Conclusions

Limitations

Future Steps

References

Shamim, A. (n.d.). Math-Students Performance Data [Data set]. Kaggle. https://www.kaggle.com/datasets/adilshamim8/math-students/data

Dua, D., & Graff, C. (2017). Student Performance Data Set. UCI Machine Learning Repository. https://archive.ics.uci.edu/dataset/320/student+performance

Cortez, P., & Silva, A. (2008). Using Data Mining to Predict Secondary School Student Performance. In A. Brito & J. Teixeira (Eds.), Proceedings of 5th FUture BUsiness TEChnology Conference (FUBUTEC 2008) (pp. 5-12). Porto, Portugal, April 2008: EUROSIS. ISBN 978-9077381-39-7.